
1 Purpose

To extract and visualise tweets and re-tweets of #dockercon for 17 - 21 April, 2017 (DockerCon17).

This analysis borrows extensively from http://thinktostart.com/twitter-authentification-with-r/

2 Load Data

The data should already have been downloaded using collectData.R. After some processing, this produces a data table with the following variables:

##  [1] "text"             "favorited"        "favoriteCount"   
##  [4] "replyToSN"        "created"          "truncated"       
##  [7] "replyToSID"       "id"               "replyToUID"      
## [10] "statusSource"     "screenName"       "retweetCount"    
## [13] "isRetweet"        "retweeted"        "longitude"       
## [16] "latitude"         "location"         "language"        
## [19] "profileImageURL"  "createdLocal"     "obsDateTimeMins" 
## [22] "obsDateTimeHours" "obsDateTime5m"    "obsDateTime10m"  
## [25] "obsDateTime15m"   "obsDate"          "isRetweetLab"
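
The report does not show how the derived time columns in the list above (createdLocal, obsDateTime5m, obsDate and so on) are built. The following is a minimal base R sketch of one way to derive them, not the code used for this report. It assumes that `created` is stored in UTC and that the local zone is "America/Chicago" (Austin); the toy data and the `floorToMinutes` helper are hypothetical.

```r
# Toy data standing in for the real tweet table (illustrative values only)
tweets <- data.frame(
  created = as.POSIXct(c("2017-04-18 14:03:41", "2017-04-18 14:07:12"), tz = "UTC")
)

# Hypothetical helper: floor a POSIXct value to the start of its n-minute bin
floorToMinutes <- function(t, n) {
  secs <- n * 60
  as.POSIXct(floor(as.numeric(t) / secs) * secs,
             origin = "1970-01-01", tz = attr(t, "tzone"))
}

# Same instants, displayed in local conference time (assumed zone)
tweets$createdLocal <- tweets$created
attr(tweets$createdLocal, "tzone") <- "America/Chicago"

# Time bins and calendar date, all in local time
tweets$obsDateTime5m  <- floorToMinutes(tweets$createdLocal, 5)
tweets$obsDateTime15m <- floorToMinutes(tweets$createdLocal, 15)
tweets$obsDate        <- as.Date(tweets$createdLocal, tz = "America/Chicago")

print(tweets)
```

Changing only the `tzone` attribute keeps the underlying instant intact, so the binned columns stay comparable with the original `created` values.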

The table has 7,525 tweets (and 10,308 re-tweets) from 5,962 tweeters between 2017-04-16 19:01:03 and 2017-04-20 09:13:22 (Central Daylight Time).
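
Summary figures of this kind can be computed from the isRetweet, screenName and createdLocal columns. The sketch below uses base R and a hypothetical four-row toy table, so the numbers it prints are not those of the report.

```r
# Toy data (hypothetical); column names follow the variable list above
tweets <- data.frame(
  screenName   = c("a", "b", "a", "c"),
  isRetweet    = c(FALSE, TRUE, TRUE, FALSE),
  createdLocal = as.POSIXct(c("2017-04-17 09:00:00", "2017-04-17 09:05:00",
                              "2017-04-18 10:00:00", "2017-04-19 11:30:00"),
                            tz = "America/Chicago"),  # assumed local zone
  stringsAsFactors = FALSE
)

nTweets   <- sum(!tweets$isRetweet)            # original tweets
nRetweets <- sum(tweets$isRetweet)             # re-tweets
nTweeters <- length(unique(tweets$screenName)) # distinct accounts
firstSeen <- min(tweets$createdLocal)
lastSeen  <- max(tweets$createdLocal)

cat(sprintf("%s tweets (and %s re-tweets) from %s tweeters between %s and %s\n",
            format(nTweets, big.mark = ","),
            format(nRetweets, big.mark = ","),
            format(nTweeters, big.mark = ","),
            format(firstSeen, "%Y-%m-%d %H:%M:%S"),
            format(lastSeen, "%Y-%m-%d %H:%M:%S")))
```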

3 Analysis

3.1 Tweets and Tweeters over time

All (re)tweets containing #dockercon 2017-04-17 to 2017-04-20
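
Plots of tweets and tweeters over time need one row per time bin, with the number of (re)tweets and the number of distinct tweeters in that bin. The base R sketch below shows one way to build such a table, assuming a 5-minute bin column like obsDateTime5m; the toy data are hypothetical and the report's own aggregation code is not shown.

```r
# Toy data (hypothetical): five (re)tweets in two 5-minute bins
tweets <- data.frame(
  screenName    = c("a", "a", "b", "c", "c"),
  isRetweet     = c(FALSE, FALSE, TRUE, FALSE, TRUE),
  obsDateTime5m = as.POSIXct(c("2017-04-18 09:00:00", "2017-04-18 09:00:00",
                               "2017-04-18 09:00:00", "2017-04-18 09:05:00",
                               "2017-04-18 09:05:00"), tz = "America/Chicago"),
  stringsAsFactors = FALSE
)

groups <- tweets[c("obsDateTime5m", "isRetweet")]

# Number of (re)tweets per bin and tweet type
nTweets <- aggregate(tweets["screenName"], by = groups, FUN = length)
names(nTweets)[3] <- "nTweets"

# Number of distinct tweeters per bin and tweet type
nTweeters <- aggregate(tweets["screenName"], by = groups,
                       FUN = function(x) length(unique(x)))
names(nTweeters)[3] <- "nTweeters"

perBin <- merge(nTweets, nTweeters, by = c("obsDateTime5m", "isRetweet"))
print(perBin)
```

A table in this shape can be passed to a plotting package, with the bin on the x axis and isRetweet as the grouping variable.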


3.1.1 Day 1 - Monday (Workshops)

All (re)tweets containing #dockercon Monday 17th April 2017


3.1.2 Day 2 - Tuesday (Main Day 1)

All (re)tweets containing #dockercon Tuesday 18th April 2017

3.1.3 Day 3 - Wednesday (Main Day 2)

All (re)tweets containing #dockercon Wednesday 19th April 2017


3.1.4 Day 4 - Thursday (Main Day 3)

All (re)tweets containing #dockercon Thursday 20th April 2017


3.2 Location (lat/long)

We wanted to make a nice map, but sadly most tweets have no lat/long set: only 52 of the 17,833 (re)tweets in the table below carry coordinates.

All logged lat/long values
latitude     longitude      nTweets
NA           NA             17781
30.26416397  -97.73961067   2
30.26857     -97.73617      1
30.2625      -97.7401       29
30.26470908  -97.7417368    1
30.20226566  -97.66722505   1
42.36488267  -71.02168356   1
37.61697678  -122.38427689  1
30.2672      -97.7639       3
30.2635554   -97.7399303    1
30.2591      -97.7384       1
30.26622515  -97.74327721   1
30.26037     -97.73848      3
30.258201    -97.71264      1
30.25888     -97.73841      2
30.259714    -97.73940054   1
30.26006     -97.73813      1
30.26006     -97.73859      1
30.26036009  -97.73848483   1
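
A table like the one above separates (re)tweets without coordinates from those with them, then counts the rows for each distinct lat/long pair. The base R sketch below illustrates this on hypothetical toy data; it is not the report's own code.

```r
# Toy data (hypothetical): three rows without coordinates, three with
tweets <- data.frame(
  latitude  = c(NA, NA, NA, 30.2625, 30.2625, 30.26857),
  longitude = c(NA, NA, NA, -97.7401, -97.7401, -97.73617)
)

hasLatLong <- !is.na(tweets$latitude) & !is.na(tweets$longitude)
nMissing   <- sum(!hasLatLong)   # rows with no usable coordinates
located    <- tweets[hasLatLong, ]

# One row per distinct coordinate pair, with the number of (re)tweets there
latLongCounts <- aggregate(list(nTweets = rep(1L, nrow(located))),
                           by = located[c("latitude", "longitude")],
                           FUN = sum)

cat("Without lat/long:", nMissing, "of", nrow(tweets), "\n")
print(latLongCounts)
```

Counting the missing rows separately avoids relying on how a grouping function treats NA keys.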

3.3 Location (textual)

This appears to be pulled from the user’s profile, although it may also be a ‘guesstimate’ of current location.

Top locations for tweets:

Top 15 locations for tweeting
location             nTweets
NA                   2727
San Francisco, CA    1277
San Francisco        499
Austin, TX           329
Seattle, WA          226
Silicon Valley, CA   194
Paris                179
Islamabad, Pakistan  140
London               133
New York, NY         122
Charlotte, NC        119
San Jose, CA         114
Boston, MA           106
west tokyo           104
Boulder, CO          103

Top locations for tweeters:

Top 15 locations for tweeters
location           nTweeters
NA                 1106
San Francisco, CA  175
Austin, TX         86
San Francisco      61
Seattle, WA        49
New York, NY       41
Paris              41
San Jose, CA       40
London, England    34
Paris, France      33
London             31
Palo Alto, CA      29
New York           28
Boston, MA         27
France             27
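
The two location tables differ in what they count: the first counts every (re)tweet, the second counts each account once. That is why a location with a few prolific accounts can rank highly for tweets but not for tweeters. The base R sketch below shows both counts on hypothetical toy data, assuming each screen name carries a single location string.

```r
# Toy data (hypothetical): one prolific account in Austin, two accounts in Paris
tweets <- data.frame(
  screenName = c("a", "a", "a", "b", "c", "d"),
  location   = c("Austin, TX", "Austin, TX", "Austin, TX", "Paris", "Paris", NA),
  stringsAsFactors = FALSE
)

# Tweets per location: every row counts, NA is kept as its own category
tweetsByLoc <- sort(table(tweets$location, useNA = "ifany"), decreasing = TRUE)

# Tweeters per location: reduce to one row per account first
tweeters      <- unique(tweets[, c("screenName", "location")])
tweetersByLoc <- sort(table(tweeters$location, useNA = "ifany"), decreasing = TRUE)

print(head(tweetsByLoc, 15))
print(head(tweetersByLoc, 15))
```

Because location is free text, spellings such as "San Francisco, CA" and "San Francisco" are counted as separate locations in both tables.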

3.4 Screen name

Next we’ll try by screen name.

Top tweeters:

Top 15 tweeters
screenName      nTweets
DockerCon       325
theCUBE         156
climbingkujira  127
jpetazzo        124
BettyJunod      123
solomonstre     111
jeanepaul       104
ManoMarks       91
OpenShiftNinja  87
sitspak         80
kaslinfields    80
vmblog          78
SFoskett        78
jameskobielus   76
bsmith626       73
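
A ranking like the one above is a count of rows per screen name, sorted in descending order and cut off at 15. A minimal base R sketch, using hypothetical placeholder accounts rather than the report's data:

```r
# Toy data (hypothetical placeholder accounts)
tweets <- data.frame(
  screenName = c("userA", "userA", "userA", "userB", "userB", "userC"),
  stringsAsFactors = FALSE
)

# Count (re)tweets per screen name
counts <- as.data.frame(table(screenName = tweets$screenName),
                        responseName = "nTweets",
                        stringsAsFactors = FALSE)

# Sort descending and keep at most the top 15
topTweeters <- head(counts[order(-counts$nTweets), ], 15)
print(topTweeters)
```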

And here’s a really bad visualisation of all of them!

N tweets per 5 minutes by screen name


So let’s re-do that for the top 50 tweeters.

N tweets per 5 minutes by screen name (top 50, most prolific tweeters at bottom)
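
Restricting the plot to the most prolific accounts means ranking screen names by total (re)tweets, keeping the top N, and then counting per 5-minute bin. The base R sketch below does this on hypothetical toy data with N = 2 (the report uses 50). Ordering the factor levels from most to least prolific is an assumption about how the plot places the busiest accounts at the bottom of the axis.

```r
# Toy data (hypothetical placeholder accounts)
tweets <- data.frame(
  screenName    = c("userA", "userA", "userA", "userB", "userB", "userC"),
  obsDateTime5m = as.POSIXct(c("2017-04-18 09:00:00", "2017-04-18 09:00:00",
                               "2017-04-18 09:05:00", "2017-04-18 09:00:00",
                               "2017-04-18 09:05:00", "2017-04-18 09:00:00"),
                             tz = "America/Chicago"),
  stringsAsFactors = FALSE
)

topN <- 2  # the report uses 50

# Rank accounts by total (re)tweets and keep the names of the top N
totals   <- sort(table(tweets$screenName), decreasing = TRUE)
topNames <- names(totals)[seq_len(min(topN, length(totals)))]

# Keep only those accounts; the first factor level is the most prolific
topTweets <- tweets[tweets$screenName %in% topNames, ]
topTweets$screenName <- factor(topTweets$screenName, levels = topNames)

# Number of (re)tweets per account per 5-minute bin
perBin <- aggregate(list(nTweets = rep(1L, nrow(topTweets))),
                    by = topTweets[c("screenName", "obsDateTime5m")],
                    FUN = sum)
print(perBin)
```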


4 About

Analysis completed in: 41.08 seconds using knitr in RStudio with R version 3.3.3 (2017-03-06) running on x86_64-apple-darwin13.4.0.

A special mention must go to twitteR (Gentry, n.d.) for the Twitter API interaction functions and to lubridate (Grolemund and Wickham 2011), which allows timezone manipulation without tears.

Other R packages used:

  • base R - for the basics (R Core Team 2016)
  • data.table - for fast (big) data handling (Dowle et al. 2015)
  • readr - for nice data loading (Wickham, Hester, and Francois 2016)
  • ggplot2 - for slick graphs (Wickham 2009)
  • plotly - fancy, zoomable slick graphs (Sievert et al. 2016)
  • knitr - to create this document (Xie 2016)

References

Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.

Gentry, Jeff. n.d. TwitteR: R Based Twitter Client. http://lists.hexdump.org/listinfo.cgi/twitter-users-hexdump.org.

Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.

R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Sievert, Carson, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, and Pedro Despouy. 2016. Plotly: Create Interactive Web Graphics via ’Plotly.js’. https://CRAN.R-project.org/package=plotly.

Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.

Wickham, Hadley, Jim Hester, and Romain Francois. 2016. Readr: Read Tabular Data. https://CRAN.R-project.org/package=readr.

Xie, Yihui. 2016. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.